ABSTRACT

Ordinal response variables often occur in practice. For example, in clinical trials a subject's response to a drug regimen might be categorized as negative, none, fair, or good. There are several common approaches to analyzing two-sample ordinal response data, and these procedures can lead to contradictory conclusions when applied to the same data. To reconcile contradictory results and give the practitioner guidance, Kimeldorf, Sampson and Whitaker (1992) propose an alternative approach. They find the scores which, when assigned to the levels of the ordinal response variable, maximize a two-sample test statistic. They also find the scores that minimize that same statistic. Many of the two-sample statistics are related by monotonic transformations, so these extreme scores are in fact extreme scores for several test statistics. If both the minimized and the maximized test statistics fall into the rejection region, there is a clear difference between the two populations or treatments. On the other hand, if neither of the two extreme statistics falls in the rejection region, then no choice of scores will produce a significant difference between the two populations. In this paper we review the KSW procedure and its implementation in SAS® software.

1.  INTRODUCTION 

Ordinal response variables often occur in practice. For example, in clinical trials a subject's response to a drug regimen might be categorized as negative, none, fair, or good. There are several common approaches to analyzing two-sample ordinal response data. These include:

- assigning arbitrary scores to the levels of the ordinal variable and then using a t-test;
- nonparametric approaches such as the Wilcoxon-Mann-Whitney test and the Cochran-Mantel-Haenszel test (Mantel (1963));
- the generalized linear model approach with ordinal response variables (McCullagh and Nelder (1983)).

Practitioners commonly try several of these tests and then, when the results are contradictory, wonder which to use. Kimeldorf, Sampson and Whitaker (1992) propose an alternative approach. They find the scores which, when assigned to the levels of the ordinal response variable, maximize a two-sample test statistic, and the scores that minimize that same statistic. Since many of the two-sample statistics are related by monotonic transformations, these extreme scores can be used to find extreme test statistics for several different two-sample tests.

Let $x_1 \le x_2 \le \cdots \le x_k$ ($x_1 \ne x_k$) be the nondecreasing scores assigned to the levels of an ordinal response variable. The KSW procedure encompasses several of the common methods:

- The Wilcoxon-Mann-Whitney statistic is a special case of the two-sample t-statistic, obtained by assigning marginal midrank scores to $x_1, \ldots, x_k$ (e.g., Conover and Iman (1981)).
- The Cochran-Mantel-Haenszel (CMH) statistic is usually calculated with one of three choices of scores for $x_1, \ldots, x_k$: equally spaced scores, marginal midrank scores (ridits), or modified ridit scores. The FREQ procedure offers these scores for calculating the CMH statistic, and it also accepts arbitrary user-provided scores.

In addition, both the signed CMH statistic and the two-sample t-statistic are increasing functions of Pearson's correlation coefficient $\rho(x_1, \ldots, x_k)$. This is the correlation between the scores assigned to the ordinal variable and the binary variable indicating whether the response is from Treatment 1.
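For reference, the quantities computed by the COR macro in the Appendix for a given scoring can be written out as follows. Here $m_j$ and $n_j$ are the Treatment 0 and Treatment 1 counts at level $j$ (as in the table of Section 2), $m = \sum_j m_j$, $n = \sum_j n_j$ and $N = m + n$. The signed CMH statistic is $\sqrt{N-1}\,\rho$, and $t$ is strictly increasing in $\rho$ on $(-1, 1)$. Together these facts are why extremes of $\rho$ give extremes of the other two statistics.

$$
\rho(x_1,\ldots,x_k) = \sqrt{\frac{mn}{N}}\;
\frac{\bar{x}_1 - \bar{x}_0}
{\sqrt{\sum_{j=1}^{k}(m_j+n_j)\,x_j^2 - \frac{1}{N}\Bigl(\sum_{j=1}^{k}(m_j+n_j)\,x_j\Bigr)^2}},
\qquad
\bar{x}_0 = \frac{1}{m}\sum_{j=1}^{k} m_j x_j,\quad
\bar{x}_1 = \frac{1}{n}\sum_{j=1}^{k} n_j x_j,
$$

$$
t = \frac{\sqrt{N-2}\;\rho}{\sqrt{1-\rho^2}}, \qquad \mathrm{CMH} = (N-1)\,\rho^2 .
$$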

Thus, suppose we find the scores $s_1, \ldots, s_k$ which maximize $\rho(x_1, \ldots, x_k)$ and the scores $t_1, \ldots, t_k$ which minimize it, over all $x_1 \le x_2 \le \cdots \le x_k$ with $x_1 \ne x_k$. We have then also found the maximum and minimum of the two-sample t-statistic and of the signed CMH statistic. Three outcomes are possible:

- Both extreme values of the statistic lie in the same tail of the rejection region. Then, no matter how the levels of the ordinal response are scored, the test statistic will be significant.
- Neither extreme value lies in the rejection region. The result is again clear: no matter what scores are assigned to the ordinal response variable, the test will fail to reject the null hypothesis.
- The extreme values straddle a critical value. The conclusion is then more difficult, because some nondecreasing scorings of the data lead to rejecting the null hypothesis while others do not.
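The three outcomes amount to a small decision rule. The Python sketch below is only an illustration. The function name `ksw_outcome` and the two-sided rule "reject when |t| >= crit" are assumptions, and the user must supply the critical value (the macro itself reports no p-values). The sketch also assumes that t varies continuously over the convex set of admissible scorings. Under that assumption every value between the minimum and maximum t is attained by some scoring, so only the position of the interval [t_min, t_max] matters. In particular, extremes in opposite tails still mean the conclusion depends on the scoring.

```python
def ksw_outcome(t_min: float, t_max: float, crit: float) -> str:
    """Classify the KSW extremes against a two-sided rule that rejects when |t| >= crit.

    Assumes every t in [t_min, t_max] is attained by some admissible scoring,
    so the answer depends only on where that interval sits.
    """
    if t_min > t_max:
        raise ValueError("t_min must not exceed t_max")
    # The whole interval lies in one tail of the rejection region.
    if t_min >= crit or t_max <= -crit:
        return "reject for every scoring"
    # The whole interval lies strictly inside the acceptance region.
    if -crit < t_min and t_max < crit:
        return "fail to reject for every scoring"
    # Otherwise some scorings reject and others do not.
    return "depends on the scoring"
```

For example, `ksw_outcome(1.415, 2.509, 2.0)` returns "depends on the scoring". Those inputs are the extreme t-statistics reported in Section 3 and an illustrative cutoff of 2.0.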

In the next section we outline the KSW procedure for finding the minimum and maximum scores and present a SAS macro that implements it. In Section 3 we give a numerical example, and in Section 4 we provide a conclusion.
2.  THE  KSW  PROCEDURE  AND  ITS  IMPLEMENTATION

The SAS code consists of a main macro, KSW, which calls the helper macros PAV, STOC_ORD and COR listed in the Appendix. The macros need only base SAS software to run and are invoked within a DATA step. KSW takes data in contingency table form. It does all the computations needed to report the minimum and maximum scores along with their t-statistics, CMH statistics, and Pearson correlations. The complete code is available from the authors.

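Suppose scores $x_1 \le x_2 \le \cdots \le x_k$, with $x_1 \ne x_k$, are assigned to the levels of the ordinal response variable. The two-sample data can then be represented as:

$$
\begin{array}{c|cccc|c}
\text{Treatment} & x_1 & x_2 & \cdots & x_k & \text{Total} \\ \hline
0 & m_1 & m_2 & \cdots & m_k & m \\
1 & n_1 & n_2 & \cdots & n_k & n \\ \hline
\text{Total} & m_1 + n_1 & m_2 + n_2 & \cdots & m_k + n_k & N
\end{array}
$$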
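The counts in this table are everything the COR macro needs for a given scoring. Below is a minimal Python sketch of the same closed-form computation. `ksw_stats` is a hypothetical name. The sketch assumes the scores are not constant over the observed data and that |r| < 1, so that t is finite.

```python
from math import sqrt


def ksw_stats(m_counts, n_counts, scores):
    """Pearson r, pooled two-sample t and CMH = (N - 1) r^2 for a 2 x k table.

    m_counts: Treatment 0 frequencies m_1..m_k
    n_counts: Treatment 1 frequencies n_1..n_k
    scores:   scores x_1..x_k for the ordinal levels (not all equal)
    """
    m, n = sum(m_counts), sum(n_counts)
    big_n = m + n
    sum_mx = sum(c * x for c, x in zip(m_counts, scores))   # total score, Treatment 0
    sum_nx = sum(c * x for c, x in zip(n_counts, scores))   # total score, Treatment 1
    sum_x2 = sum((a + b) * x * x for a, b, x in zip(m_counts, n_counts, scores))
    sxx = sum_x2 - (sum_mx + sum_nx) ** 2 / big_n           # corrected sum of squares of the scores
    # r = sqrt(mn/N) * (mean score in Treatment 1 - mean score in Treatment 0) / sqrt(Sxx)
    r = sqrt(m * n / big_n) * (sum_nx / n - sum_mx / m) / sqrt(sxx)
    t = sqrt(big_n - 2) * r / sqrt(1 - r * r)               # finite only when |r| < 1
    cmh = (big_n - 1) * r * r
    return r, t, cmh
```

For example, `ksw_stats([12, 10, 4, 6], [5, 8, 8, 11], [0, 0, 0, 1])` uses the ulcer-crater counts of Section 3 with the reported minimum scores. It gives t of about 1.4151, the MIN_T value in the log.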

Correlation is invariant under changes of location and positive changes of scale. We can therefore, without loss of generality and for ease of use, optimize $\rho(x_1, \ldots, x_k)$ over scores $0 = x_1 \le x_2 \le \cdots \le x_k = 1$. The notion of stochastic ordering plays an important role in the computations. The empirical distribution of Treatment 1 is said to be stochastically greater than that of Treatment 0 if

$$
\frac{n_j + \cdots + n_k}{n} \ge \frac{m_j + \cdots + m_k}{m} \tag{2.1}
$$

for $j = 2, \ldots, k$. If the inequality (2.1) is reversed for every $j$, then Treatment 0 is said to be stochastically greater than Treatment 1. If neither holds, then the empirical distributions from the two treatments are stochastically incomparable. For simplicity, we treat three cases separately when computing the scores $s_1, \ldots, s_k$ that maximize $\rho$ and the scores $t_1, \ldots, t_k$ that minimize $\rho$:

- Case 1: Treatment 0 data are stochastically greater than Treatment 1 data.
- Case 2: Treatment 1 data are stochastically greater than Treatment 0 data.
- Case 3: Treatment 0 and Treatment 1 data are stochastically incomparable.

Thus, the first step in the computation is to decide which of the three cases the data fall into.
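Checking (2.1) and its reverse for every $j$ is a short loop over upper-tail sums. The Python sketch below follows the logic of the STOC_ORD macro in the Appendix, including its rule of reporting identical distributions as case 1. The function name is hypothetical. The sketch compares cross-products instead of ratios, a choice made here so that integer counts are compared exactly.

```python
def stochastic_case(m_counts, n_counts):
    """Return the KSW case for a 2 x k table of frequencies.

    1: Treatment 0 is empirically stochastically greater (reverse of (2.1)),
    2: Treatment 1 is empirically stochastically greater ((2.1) holds),
    3: the two empirical distributions are stochastically incomparable.
    Identical distributions satisfy both orderings and are reported as case 1.
    """
    m, n = sum(m_counts), sum(n_counts)
    case1 = case2 = True
    for j in range(1, len(m_counts)):   # upper tails starting at levels 2, ..., k
        tail_m = sum(m_counts[j:])
        tail_n = sum(n_counts[j:])
        # tail_m / m versus tail_n / n, compared via cross-multiplication
        case1 = case1 and tail_m * n >= tail_n * m
        case2 = case2 and tail_m * n <= tail_n * m
    if case1:
        return 1
    if case2:
        return 2
    return 3
```

For the ulcer-crater counts in Section 3, `stochastic_case([12, 10, 4, 6], [5, 8, 8, 11])` returns 2. This agrees with the statement there that Treatment A (Treatment 0) is stochastically less than Treatment B.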

If the data fall into case 1, we find the minimum scores $t_1, \ldots, t_k$ by first finding the isotonic regression $y_1, \ldots, y_k$ of $m_j/(m_j + n_j)$ with weights $m_j + n_j$. There are several algorithms for computing the isotonic regression. In the SAS macro, we use the Pool Adjacent Violators Algorithm (PAVA) (see Robertson, Dykstra and Wright (1988)), whose code is given in the Appendix. The scores are then obtained by rescaling the isotonic regression as $t_i = (y_i - y_1)/(y_k - y_1)$ for $i = 1, \ldots, k$. To find the maximum scores $s_1, \ldots, s_k$, we compute $\rho(x_1, \ldots, x_k)$ for each of the $k - 1$ scores of the form $0 = x_1 = \cdots = x_j$ and $1 = x_{j+1} = \cdots = x_k$, $j = 1, \ldots, k-1$. The maximum scores are the ones that give the largest $\rho(x_1, \ldots, x_k)$.

If the data fall into case 2, the roles of the two treatments are reversed. For the maximum scores $s_1, \ldots, s_k$, we compute the isotonic regression $y_1, \ldots, y_k$ of $n_j/(m_j + n_j)$ with weights $m_j + n_j$. We then rescale it to get $s_i = (y_i - y_1)/(y_k - y_1)$ for $i = 1, \ldots, k$. The minimum scores $t_1, \ldots, t_k$ are the scores that minimize $\rho(x_1, \ldots, x_k)$ among the scores of the form $0 = x_1 = \cdots = x_j$ and $1 = x_{j+1} = \cdots = x_k$ for $j = 1, \ldots, k-1$.

For case 3, the scores $s_1, \ldots, s_k$ are computed as in case 2 and the scores $t_1, \ldots, t_k$ as in case 1. The macro KSW, implementing this procedure, is:


```sas
%*********************************************************;
%* Macro    : KSW                                         ;
%* Author   : Michael Whitaker                            ;
%* Input    : n_lev    = The number of ordinal levels     ;
%*            treat0   = The freq dist for treatment 0    ;
%*            treat1   = The freq dist for treatment 1    ;
%* Output   : minscore = Scores that give min r           ;
%*            maxscore = Scores that give max r           ;
%*            min_t    = Min t-statistic                  ;
%*            max_t    = Max t-statistic                  ;
%*            min_r    = Min Pearson corr                 ;
%*            max_r    = Max Pearson corr                 ;
%*            min_cmh  = Min CMH statistic                ;
%*            max_cmh  = Max CMH statistic                ;
%* Required Macros : PAV, Stoc_ord, Cor                   ;
%* Required Procs  : None                                 ;
%* Comments : treat0, treat1, minscore and maxscore are   ;
%*            names of arrays with n_lev elements. The    ;
%*            variables with scores, t, r and CMH         ;
%*            statistics will always be returned by the   ;
%*            macro.                                      ;
%*********************************************************;
%macro ksw(n_lev=, treat0=, treat1=, minscore=_min_scr_,
           maxscore=_max_scr_, min_r=_min_r_, max_r=_max_r_,
           min_t=_min_t_, max_t=_max_t_, min_cmh=_min_c_,
           max_cmh=_max_c_);
   %local j;
   %*;
   %* Define the work arrays. Temporary arrays are retained ;
   %* across observations, so the 0/1 score arrays are      ;
   %* reset before each search below                        ;
   %*;
   array _w_    {&n_lev} _temporary_;
   array _t0_   {&n_lev} _temporary_ (0 %do j = 1 %to %eval(&n_lev-1); ,0 %end;);
   array _t1_   {&n_lev} _temporary_ (0 %do j = 1 %to %eval(&n_lev-1); ,0 %end;);
   array _y0_   {&n_lev} _temporary_;
   array _y1_   {&n_lev} _temporary_;
   array _z0_   {%eval(&n_lev-1), &n_lev} _temporary_;
   array _z1_   {%eval(&n_lev-1), &n_lev} _temporary_;
   array _r0_   {%eval(&n_lev-1)} _temporary_;
   array _r1_   {%eval(&n_lev-1)} _temporary_;
   array _cmh0_ {%eval(&n_lev-1)} _temporary_;
   array _cmh1_ {%eval(&n_lev-1)} _temporary_;
   array _stt0_ {%eval(&n_lev-1)} _temporary_;
   array _stt1_ {%eval(&n_lev-1)} _temporary_;

   %*;
   %* Check for stochastic ordering of the empirical       ;
   %* distributions. The result is placed in _case_        ;
   %*;
   %stoc_ord(pop0=&treat0, pop1=&treat1, case=_case_);

   %*;
   %* Case 1 (treatment 0 stochastically greater)          ;
   %*   for min, use isotonic regression                   ;
   %*   for max, search over scores of 0s and 1s           ;
   %* Case 2 (treatment 1 stochastically greater)          ;
   %*   for max, use isotonic regression                   ;
   %*   for min, search over scores of 0s and 1s           ;
   %* Case 3 (incomparable): for both max and min, use     ;
   %*   isotonic regression                                ;
   %*;
   select (_case_);
      when (1) do;
         %* Create the y(i) from the empirical distribution;
         do _ksw_j_ = 1 to dim(&treat0);
            _w_(_ksw_j_)  = &treat0(_ksw_j_) + &treat1(_ksw_j_);
            _y0_(_ksw_j_) = &treat0(_ksw_j_) / (&treat0(_ksw_j_) + &treat1(_ksw_j_));
         end;
         %* Find the isotonic regression;
         %pav(max_els=&n_lev, array=_y0_, weights=_w_);
         %* Re-scale to include 0 and 1;
         do _ksw_j_ = 1 to dim(_y0_);
            _t0_(_ksw_j_) = (_y0_(_ksw_j_) - _y0_(1)) / (_y0_(&n_lev) - _y0_(1));
         end;
         %* Compute the correlation, t and CMH for those scores;
         %cor(pop0=&treat0, pop1=&treat1, score=_t0_,
              r=&min_r, t=&min_t, cmh=&min_cmh);
         %* Copy the scores into the output array;
         do _ksw_k_ = 1 to dim(&minscore);
            &minscore(_ksw_k_) = _t0_(_ksw_k_);
         end;

         %* This finishes the minimum score. Now construct     ;
         %* scores of the form 0=x(1)=...=x(j) and             ;
         %* 1=x(j+1)=...=x(k) for j=1,...,k-1, then pick the   ;
         %* one that gives the maximum correlation             ;
         do _ksw_k_ = 1 to &n_lev;
            _t1_(_ksw_k_) = 0;
         end;
         do _ksw_j_ = dim(&treat0) to 2 by -1;
            do _ksw_k_ = _ksw_j_ to dim(&treat1);
               _t1_(_ksw_k_) = 1;
            end;
            %cor(pop0=&treat0, pop1=&treat1, score=_t1_,
                 r=_r_, t=_stud_t_, cmh=_cmh_);
            %* Save the score and statistics for later interrogation;
            _r1_(_ksw_j_-1)   = _r_;
            _stt1_(_ksw_j_-1) = _stud_t_;
            _cmh1_(_ksw_j_-1) = _cmh_;
            do _ksw_k_ = 1 to dim(_t1_);
               _z1_(_ksw_j_-1, _ksw_k_) = _t1_(_ksw_k_);
            end;
         end;
         %* Find the score that gives the max correlation;
         _best_r_ = -1;
         do _ksw_k_ = 1 to dim(_r1_);
            if (_best_r_ <= _r1_(_ksw_k_)) then do;
               _best_r_ = _r1_(_ksw_k_);
               _in_max_ = _ksw_k_;
            end;
         end;
         %* Copy these values to the output variables;
         do _ksw_k_ = 1 to dim(&maxscore);
            &maxscore(_ksw_k_) = _z1_(_in_max_, _ksw_k_);
         end;
         &max_r   = _r1_(_in_max_);
         &max_t   = _stt1_(_in_max_);
         &max_cmh = _cmh1_(_in_max_);
      end;
      when (2) do;
         %* Same as case 1 with the roles of the distributions reversed;
         do _ksw_j_ = 1 to dim(&treat1);
            _w_(_ksw_j_)  = &treat0(_ksw_j_) + &treat1(_ksw_j_);
            _y1_(_ksw_j_) = &treat1(_ksw_j_) / (&treat0(_ksw_j_) + &treat1(_ksw_j_));
         end;
         %pav(max_els=&n_lev, array=_y1_, weights=_w_);
         do _ksw_j_ = 1 to dim(_y1_);
            _t1_(_ksw_j_) = (_y1_(_ksw_j_) - _y1_(1)) / (_y1_(&n_lev) - _y1_(1));
         end;
         %cor(pop0=&treat0, pop1=&treat1, score=_t1_,
              r=&max_r, t=&max_t, cmh=&max_cmh);
         do _ksw_k_ = 1 to dim(&maxscore);
            &maxscore(_ksw_k_) = _t1_(_ksw_k_);
         end;
         %* Search the 0/1 scores for the minimum correlation;
         do _ksw_k_ = 1 to &n_lev;
            _t0_(_ksw_k_) = 0;
         end;
         do _ksw_j_ = dim(&treat0) to 2 by -1;
            do _ksw_k_ = _ksw_j_ to dim(&treat0);
               _t0_(_ksw_k_) = 1;
            end;
            %cor(pop0=&treat0, pop1=&treat1, score=_t0_,
                 r=_r_, t=_stud_t_, cmh=_cmh_);
            _r0_(_ksw_j_-1)   = _r_;
            _stt0_(_ksw_j_-1) = _stud_t_;
            _cmh0_(_ksw_j_-1) = _cmh_;
            do _ksw_k_ = 1 to dim(_t0_);
               _z0_(_ksw_j_-1, _ksw_k_) = _t0_(_ksw_k_);
            end;
         end;
         _best_r_ = 1;
         do _ksw_k_ = 1 to dim(_r0_);
            if (_best_r_ >= _r0_(_ksw_k_)) then do;
               _best_r_ = _r0_(_ksw_k_);
               _in_min_ = _ksw_k_;
            end;
         end;
         do _ksw_k_ = 1 to dim(&minscore);
            &minscore(_ksw_k_) = _z0_(_in_min_, _ksw_k_);
         end;
         &min_r   = _r0_(_in_min_);
         &min_t   = _stt0_(_in_min_);
         &min_cmh = _cmh0_(_in_min_);
      end;
      when (3) do;
         %* Create the y(i) from the empirical distributions;
         do _ksw_j_ = 1 to dim(&treat0);
            _w_(_ksw_j_)  = &treat0(_ksw_j_) + &treat1(_ksw_j_);
            _y0_(_ksw_j_) = &treat0(_ksw_j_) / (&treat0(_ksw_j_) + &treat1(_ksw_j_));
            _y1_(_ksw_j_) = &treat1(_ksw_j_) / (&treat0(_ksw_j_) + &treat1(_ksw_j_));
         end;
         %* Find the isotonic regressions;
         %pav(max_els=&n_lev, array=_y0_, weights=_w_);
         %pav(max_els=&n_lev, array=_y1_, weights=_w_);
         %* Re-scale to include 0 and 1;
         do _ksw_j_ = 1 to dim(_y0_);
            _t0_(_ksw_j_) = (_y0_(_ksw_j_) - _y0_(1)) / (_y0_(&n_lev) - _y0_(1));
            _t1_(_ksw_j_) = (_y1_(_ksw_j_) - _y1_(1)) / (_y1_(&n_lev) - _y1_(1));
         end;
         %* Compute the correlation, t and CMH for those scores;
         %cor(pop0=&treat0, pop1=&treat1, score=_t0_,
              r=&min_r, t=&min_t, cmh=&min_cmh);
         %cor(pop0=&treat0, pop1=&treat1, score=_t1_,
              r=&max_r, t=&max_t, cmh=&max_cmh);
         %* Copy these values into the output arrays;
         do _ksw_j_ = 1 to dim(&maxscore);
            &minscore(_ksw_j_) = _t0_(_ksw_j_);
            &maxscore(_ksw_j_) = _t1_(_ksw_j_);
         end;
      end;
   end;
   drop _case_ _r_ _stud_t_ _cmh_ _ksw_k_ _ksw_j_ _in_max_ _in_min_ _best_r_;
%mend ksw;
```


3.  EXAMPLE 

We illustrate this procedure with an example using data from Agresti (1984), where two treatments are used to try to heal ulcer craters.

$$
\begin{array}{c|cccc}
\text{Treatment} & x_1 & x_2 & x_3 & x_4 \\ \hline
\text{A} & 12 & 10 & 4 & 6 \\
\text{B} & 5 & 8 & 8 & 11
\end{array}
$$

The DATA step implementing the KSW procedure for these data is:

```sas
options sasautos='c:\sugi';   /* autocall library for KSW, PAV, STOC_ORD and COR */
data agresti;
   infile cards;
   array treat0 {*} a1-a4;    /* Treatment A frequencies */
   array treat1 {*} b1-b4;    /* Treatment B frequencies */
   array minscr {4};
   array maxscr {4};
   input a1-a4;
   input b1-b4;
   %ksw(treat0=treat0, treat1=treat1, n_lev=4,
        min_t=min_t, max_t=max_t,
        maxscore=maxscr, minscore=minscr);
   put minscr(*)= min_t=;
   put maxscr(*)= max_t=;
   cards;
12 10 4 6
5 8 8 11
;
run;
```


The log for this example is:

```
NOTE: Copyright (c) 1989-1993 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software Release 6.10  TS019
      Licensed to CTB/MCGRAW-HILL, Site 0009289001.
NOTE: The SAS System for Microsoft Windows, Release 6.10 Limited Production

1    options sasautos='c:\sugi';
2    data agresti;
3    infile cards;
4    array treat0 {*} a1 - a4;
5    array treat1 {*} b1 - b4;
6    array minscr {4};
7    array maxscr {4};
8    input a1 - a4;
9    input b1-b4;
10   %ksw(treat0=treat0,treat1=treat1,n_lev=4,
11   min_t=min_t,max_t=max_t,
12   maxscore=maxscr,minscore=minscr);
13   put minscr(*)= Min_t=;
14   put maxscr(*)= max_t=;
15   cards;

MINSCR1=0 MINSCR2=0 MINSCR3=0 MINSCR4=1 MIN_T=1.4151268421
MAXSCR1=0 MAXSCR2=0.4163545568 MAXSCR3=1 MAXSCR4=1
MAX_T=2.508647573
NOTE: The data set WORK.AGRESTI has 1 observations and 22 variables.
NOTE: The DATA statement used 6.7 seconds.

18   ;
19   run;
```

Note that there are 22 variables in this example: 8 are the frequencies, 8 are the extreme scores, 2 are t-statistics, 2 are CMH statistics, and 2 are Pearson correlations.

The empirical distribution of ulcer crater size for Treatment A is stochastically less than that for Treatment B. Thus, the minimum scores are found by searching through the scores of 0's and 1's, and the maximum scores are found using the PAVA.

The resulting output gives:

- minimum scores $t_1 = t_2 = t_3 = 0$ and $t_4 = 1$, with a minimum t of 1.42;
- maximum scores $s_1 = 0$, $s_2 = 0.4164$, $s_3 = s_4 = 1$, with a corresponding maximum t of 2.51.

No scoring supports the alternative that Treatment A is better than Treatment B. However, some scores lead to rejection of the null hypothesis that the two treatments are the same, in favor of a difference between them (or of Treatment B being better than Treatment A). Other scores fail to reject that null hypothesis. This straddling situation requires the practitioner to re-evaluate which differences between the treatments are of practical significance. Closer inspection of the minimum and maximum scores shows two things. If the practitioner is interested in drugs that show any type of improvement, regardless of its degree, then the two treatments are very similar. On the other hand, if the practitioner is really interested in completely or almost completely healing ulcer craters, then these data present evidence that Treatment B is better than Treatment A.
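The log can be checked independently by evaluating the t-statistic over a grid of admissible scorings with $0 = x_1 \le x_2 \le x_3 \le x_4 = 1$, a restriction permitted by the location-scale invariance noted in Section 2. The Python sketch below is illustrative only. The brute-force scan, the grid of 101 points per score and the function names are choices made here, not part of the KSW macro. A grid cannot exceed the exact optimum, so its maximum should sit just below the logged MAX_T. Its minimum should reproduce MIN_T at the step score (0, 0, 0, 1), which lies on the grid.

```python
from math import sqrt

# Ulcer-crater frequencies from the table in Section 3.
A = [12, 10, 4, 6]   # Treatment A, passed to KSW as treat0
B = [5, 8, 8, 11]    # Treatment B, passed to KSW as treat1


def t_statistic(scores, m_counts=A, n_counts=B):
    """Pooled two-sample t obtained from Pearson's r, using the closed form of COR."""
    m, n = sum(m_counts), sum(n_counts)
    big_n = m + n
    sum_mx = sum(c * x for c, x in zip(m_counts, scores))
    sum_nx = sum(c * x for c, x in zip(n_counts, scores))
    sum_x2 = sum((a + b) * x * x for a, b, x in zip(m_counts, n_counts, scores))
    sxx = sum_x2 - (sum_mx + sum_nx) ** 2 / big_n
    r = sqrt(m * n / big_n) * (sum_nx / n - sum_mx / m) / sqrt(sxx)
    return sqrt(big_n - 2) * r / sqrt(1 - r * r)


def grid_extremes(steps=100):
    """Scan 0 = x1 <= x2 <= x3 <= x4 = 1 on a grid; return the min and max (t, scores)."""
    points = [i / steps for i in range(steps + 1)]
    results = []
    for i, x2 in enumerate(points):
        for x3 in points[i:]:                 # x3 >= x2 keeps the scores nondecreasing
            scores = [0.0, x2, x3, 1.0]
            results.append((t_statistic(scores), scores))
    return min(results), max(results)
```

Running `(t_lo, s_lo), (t_hi, s_hi) = grid_extremes()` gives t_lo of about 1.4151 at (0, 0, 0, 1) and a t_hi just below 2.5086, with $x_2$ close to 0.416 and $x_3 = 1$. Such a scan grows combinatorially with the number of levels. The isotonic-regression and step-score route used by KSW needs only one PAVA pass and $k - 1$ correlations.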

4.  CONCLUSION 

The KSW procedure provides an approach for analyzing two-sample ordinal data. Most methods assign scores to the levels of the ordinal variable, either explicitly or tacitly. For true ordinal variables, however, no single underlying set of scores adequately describes the levels. Practitioners therefore often try different scores or different methods, frequently with conflicting results. KSW helps reconcile these differences by finding the scores which maximize, and the scores which minimize, both the CMH statistic and the t-statistic. In this paper, we implement the KSW procedure. To enhance the portability of the KSW macro, the code is written using only base SAS software.

The KSW statistics should not be thought of as test statistics. They are extreme values over a set of test statistics generated from all possible ordinal scorings, including scorings that pool levels of the ordinal variable. For this reason, we have purposely left p-values out of the KSW macro. There is no distribution theory for the KSW procedure. Even so, as the ulcer crater example showed, the extreme t-statistics and their scores together give deeper insight into the data than any one of the usual methods used alone.

The more general problem of finding extreme scores for ordinal response variables in an ANOVA setting is treated in Gautam (1991). Streitberg and Roehmel (1988) give a method for computing bounds for p-values for a class of permutation tests in the two-sample setting. They do not give extreme scores, and their algorithm is implemented in TESTIMATE.
6.  APPENDIX


```sas
%*********************************************************;
%* Macro    : PAV                                         ;
%* Author   : Michael Whitaker                            ;
%* Input    : max_els = The max number of elements        ;
%*            array   = The array of data                 ;
%*            weights = Weights used in the regression    ;
%* Output   : array   = the same array as above           ;
%* Required Macros : None                                 ;
%* Required Procs  : None                                 ;
%* Comments : This will perform an isotonic regression    ;
%*            in one dimension. The array will hold the   ;
%*            processed data.                             ;
%*********************************************************;
%macro pav(max_els=, array=, weights=);
   %global index;
   %* A counter gives each call its own work arrays and label;
   %if %quote(&index) = %then %let index = 1;
   %else %let index = %eval(&index + 1);
   %let pooled = _pool&index;
   %let parray = _parr&index;
   %let pwghts = _pwgt&index;
   %* pooled(i)=1 means element i is in the same block as i-1;
   array &pooled (&max_els) _temporary_;
   array &parray (&max_els) _temporary_;
   array &pwghts (&max_els) _temporary_;

   if dim(&array) = 1 then go to epav&index;
   do _pav_j_ = 1 to dim(&array);
      &pooled(_pav_j_) = 0;
      &parray(_pav_j_) = 0;
      &pwghts(_pav_j_) = 0;
   end;
   &parray(1) = &array(1);
   &pwghts(1) = &weights(1);
   _pav_j_ = 1;                          %* index of the last block;
   do _pav_i_ = 2 to dim(&array);
      if (&parray(_pav_j_) > &array(_pav_i_)) then do;
         %* Violator: pool element i into the last block;
         _plwght_ = &pwghts(_pav_j_) + &weights(_pav_i_);
         _plval_  = ((&parray(_pav_j_) * &pwghts(_pav_j_)) +
                     (&array(_pav_i_) * &weights(_pav_i_))) / _plwght_;
         &pooled(_pav_i_) = 1;
         if _pav_j_ > 1 then do;
            _pav_j_  = _pav_j_ - 1;
            _pav_jj_ = _pav_i_;
            %* Keep pooling backwards while the previous block violates;
            do while (_pav_j_ >= 1);
               if (&parray(_pav_j_) <= _plval_) then leave;
               _tplval_ = _plval_;
               _tplwgt_ = _plwght_;
               %* Find the first element of the current pooled block;
               do until (^&pooled(_pav_jj_));
                  _pav_jj_ = _pav_jj_ - 1;
               end;
               _plwght_ = &pwghts(_pav_j_) + _tplwgt_;
               _plval_  = ((&parray(_pav_j_) * &pwghts(_pav_j_)) +
                           (_tplval_ * _tplwgt_)) / _plwght_;
               &pooled(_pav_jj_) = 1;
               _pav_j_ = _pav_j_ - 1;
            end;
            _pav_j_ = _pav_j_ + 1;
         end;
         &parray(_pav_j_) = _plval_;
         &pwghts(_pav_j_) = _plwght_;
      end;
      else do;
         %* No violation: start a new block;
         _pav_j_ = _pav_j_ + 1;
         &parray(_pav_j_) = &array(_pav_i_);
         &pwghts(_pav_j_) = &weights(_pav_i_);
      end;
   end;
   %* Expand the block means back over the original elements;
   &array(1) = &parray(1);
   _pav_jj_ = 1;
   do _pav_j_ = 2 to dim(&array);
      if ^&pooled(_pav_j_) then _pav_jj_ = _pav_jj_ + 1;
      &array(_pav_j_) = &parray(_pav_jj_);
   end;
epav&index:
   drop _pav_j_ _pav_i_ _pav_jj_ _plval_ _plwght_ _tplval_ _tplwgt_;
%mend pav;
```
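For comparison with the PAV macro above, the same weighted pool-adjacent-violators algorithm can be written with a stack of blocks instead of pooled flags. This Python sketch is illustrative, not the authors' code. `pava` and `rescale` are hypothetical names, and all weights are assumed positive, that is, every ordinal level has at least one observation.

```python
def pava(values, weights):
    """Weighted isotonic (nondecreasing) regression by pooling adjacent violators."""
    blocks = []  # each block is [weighted mean, total weight, number of elements]
    for value, weight in zip(values, weights):
        blocks.append([value, weight, 1])
        # Pool backwards while the last two blocks are out of order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks[-1]
            total = w1 + w2
            blocks[-1] = [(v1 * w1 + v2 * w2) / total, total, c1 + c2]
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)   # every element of a block gets the block mean
    return fitted


def rescale(y):
    """Map y_1..y_k to (y_i - y_1) / (y_k - y_1) so the scores run from 0 to 1."""
    low, high = y[0], y[-1]
    return [(v - low) / (high - low) for v in y]
```

The ulcer-crater data of Section 3 fall into case 2, so PAVA is applied to the proportions n_j/(m_j+n_j) = 5/17, 8/18, 8/12, 11/17 with weights 17, 18, 12, 17. These proportions have one violation, at the last two levels. Pooling those levels to 19/29 and rescaling gives 0, 0.41635..., 1, 1, the MAXSCR values in the log.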

```sas
%*********************************************************;
%* Macro    : Stoc_ord                                    ;
%* Author   : Michael Whitaker                            ;
%* Input    : pop0 = The first freq dist                  ;
%*            pop1 = The second freq dist                 ;
%* Output   : case = the case (1, 2, or 3)                ;
%* Required Macros : None                                 ;
%* Required Procs  : None                                 ;
%* Comments : This will check two empirical probability   ;
%*            distributions for stochastic dominance.     ;
%*            case=1: pop0 is dominant, case=2: pop1 is   ;
%*            dominant, case=3: neither is dominant       ;
%*********************************************************;
%macro stoc_ord(pop0=, pop1=, case=);
   &case = .;
   %* Totals m and n;
   _sum_m_ = 0;
   _sum_n_ = 0;
   do _stoc_j_ = 1 to dim(&pop0);
      _sum_m_ = _sum_m_ + &pop0(_stoc_j_);
      _sum_n_ = _sum_n_ + &pop1(_stoc_j_);
   end;
   %* Compare the upper-tail proportions for j=2,...,k;
   _case_1_ = 1;
   _case_2_ = 1;
   do _stoc_j_ = 2 to dim(&pop0);
      _psum_m_ = 0;
      _psum_n_ = 0;
      do _stoc_k_ = _stoc_j_ to dim(&pop0);
         _psum_m_ = _psum_m_ + &pop0(_stoc_k_);
         _psum_n_ = _psum_n_ + &pop1(_stoc_k_);
      end;
      _case_1_ = (_case_1_ and ((_psum_m_ / _sum_m_) >= (_psum_n_ / _sum_n_)));
      _case_2_ = (_case_2_ and ((_psum_m_ / _sum_m_) <= (_psum_n_ / _sum_n_)));
   end;
   if _case_1_ then &case = 1;
   else if _case_2_ then &case = 2;
   else &case = 3;
   drop _psum_m_ _psum_n_ _sum_m_ _sum_n_ _case_1_ _case_2_
        _stoc_j_ _stoc_k_;
%mend stoc_ord;
```


```sas
%*********************************************************;
%* Macro    : Cor                                         ;
%* Author   : Michael Whitaker                            ;
%* Input    : pop0  = The first freq dist                 ;
%*            pop1  = The second freq dist                ;
%*            score = the score to use                    ;
%* Output   : r     = the correlation statistic           ;
%*            t     = the Student t                       ;
%*            cmh   = the CMH statistic                   ;
%* Required Macros : None                                 ;
%* Required Procs  : None                                 ;
%* Comments : This will compute the r, t and CMH for      ;
%*            pop0, pop1 and score                        ;
%*********************************************************;
%macro cor(pop0=, pop1=, score=, r=, cmh=, t=);
   _tmx_   = 0;    %* sum of scores, treatment 0;
   _tnx_   = 0;    %* sum of scores, treatment 1;
   _tmnx2_ = 0;    %* sum of squared scores, both treatments;
   _tm_    = 0;    %* m, the count for treatment 0;
   _tn_    = 0;    %* n, the count for treatment 1;
   do _cor_k_ = 1 to dim(&pop0);
      _tmx_   = _tmx_ + &pop0(_cor_k_) * &score(_cor_k_);
      _tnx_   = _tnx_ + &pop1(_cor_k_) * &score(_cor_k_);
      _tmnx2_ = _tmnx2_ +
                ((&pop0(_cor_k_) + &pop1(_cor_k_)) * &score(_cor_k_)**2);
      _tm_    = _tm_ + &pop0(_cor_k_);
      _tn_    = _tn_ + &pop1(_cor_k_);
   end;
   _tt_ = _tm_ + _tn_;
   %* Pearson r between the scores and the treatment 1 indicator;
   &r   = sqrt((_tt_-1)/_tt_) / sqrt(_tt_-1) * sqrt(_tm_*_tn_) *
          ((_tnx_/_tn_) - (_tmx_/_tm_)) /
          sqrt(_tmnx2_ - ((_tmx_ + _tnx_)**2 / _tt_));
   &cmh = (_tt_-1) * &r**2;
   &t   = sqrt(_tt_-2) * &r / sqrt(1 - &r**2);
   drop _cor_k_ _tm_ _tn_ _tt_ _tmx_ _tnx_ _tmnx2_;
%mend cor;
```


SAS is a registered trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.